Developing Academic English Language 
Proficiency Prototypes for 5th Grade Reading: 
Psychometric and Linguistic Profiles of Tasks 
An Extended Executive Summary 

CSE Report 720 

Alison L. Bailey, Becky H. Huang, Hye Won Shin, and Tim Farnsworth 

National Center for Research on Evaluation, 
Standards, and Student Testing (CRESST) 

University of California, Los Angeles 

Frances A. Butler 
Language Testing Consultant 



June 2007 



Center for the Study of Evaluation 
National Center for Research on Evaluation, 
Standards, and Student Testing 
Graduate School of Education & Information Studies 
University of California, Los Angeles 
Los Angeles, CA 90095-1522 
(310) 206-1532 




Copyright © 2007 The Regents of the University of California 

The work reported herein was supported under the Educational Research and Development Centers 
Program, PR/Award Number R305B960002, as administered by the Institute of Education Sciences, 
U.S. Department of Education. 

The findings and opinions expressed in this report do not reflect the positions or policies of the 
National Institute on Student Achievement, Curriculum, and Assessment, the Institute of Education 
Sciences, or the U.S. Department of Education. 




DEVELOPING ACADEMIC ENGLISH LANGUAGE PROFICIENCY 
PROTOTYPES FOR 5TH GRADE READING: PSYCHOMETRIC AND 
LINGUISTIC PROFILES OF TASKS 
AN EXTENDED EXECUTIVE SUMMARY 1 

Alison L. Bailey, Becky H. Huang, Hye Won Shin, and Tim Farnsworth 
National Center for Research on Evaluation, Standards, and Student Testing 
University of California, Los Angeles 

Frances A. Butler 
Language Testing Consultant 

Abstract 

Within an evidentiary framework for operationally defining academic English language 
proficiency (AELP), linguistic analyses of standards, classroom discourse, and textbooks 
have led to specifications for assessment of AELP. The test development process 
described here is novel due to the emphasis on using linguistic profiles to inform the 
creation of test specifications and guide the writing of draft tasks. In this report, we 
outline the test development process we have adopted and provide the results of studies 
designed to turn the drafted tasks into illustrative prototypes (i.e., tried-out tasks) of 
AELP for the 5th grade. The tasks use the reading modality; however, they were drafted 
to measure the academic language construct and not reading comprehension per se. 
That is, the tasks isolate specific language features (e.g., vocabulary, grammar, language 
functions) occurring in different content areas (e.g., mathematics, science, and social 
studies texts). Taken together, these features are necessary for reading comprehension in 
the content areas. Indeed, students will need to control all these features in order to 
comprehend information presented in their textbooks. By focusing on the individual 
language features, rather than the subject matter or overall meaning of a text, the AELP 
tasks are designed to help determine whether a student has sufficient antecedent 
knowledge of English language features to be able to comprehend the content of a text. 

The work reported here is the third and final stage of an iterative test development 
process. In previous National Center for Research on Evaluation, Standards, and Student 
Testing (CRESST) work, we conducted a series of studies to develop specifications and 
create tasks of AELP. Specifically, we first specified the construct by synthesizing 
evidence from linguistic analyses of English language development (ELD) and content 
standards, textbooks (mathematics, science, and social studies), and teacher talk in 
classrooms, resulting in language demand profiles for the 5th grade. After determining 
task format by frequency of assessment types in textbooks, we then created draft tasks 
aligned with the language profiles. 

1 We gratefully acknowledge the following publishers for permission to use textbook excerpts in the 
CRESST test development process: Harcourt for Math (2002) National Edition, Science (2000) 
California Edition, and Social Studies: Early United States (2002) National Edition; Houghton Mifflin for 
Mathematics (2002) California Edition, Science (2000) California Edition, and Social Studies: America Will Be 
(1999) National Edition; McGraw-Hill for Math Explorations and Applications (2003) National Edition, 
Science (2000) California Edition, and United States: Adventure in Time and Place (2001) National Edition. 

The goals of the current effort were to take these previously drafted tasks and create 
prototypes by trying out the tasks for the first time with 224 students from native English 
and English language learner (ELL) backgrounds. Students across the 4th-6th grades, as 
well as native English-speaking students, were included in the studies because native 
speakers and adjacent grades provide critical information about the targeted language 
abilities of mainstream students at the 5th grade level. Phase 1 (n = 96) involved various 
tryouts of 101 draft tasks to estimate duration of administration, evaluate the clarity of 
directions and whole-class administration procedures, and provide an opportunity to 
administer verbal protocols that yield further information about task accessibility and 
characteristics. Phase 2, the pilot stage, involved administration of 40 retained tasks (35 
of which were modified as a result of Phase 1) to students in whole-class settings 
(n = 128). Analyses included item difficulty and item discrimination. The rationale for 
retaining or rejecting tasks is presented along with psychometric/linguistic profiles 
documenting the evolution of example effective and ineffective prototype tasks. The final 
chapter of the report reflects on the lessons learned from the test development process we 
adopted and makes suggestions for further advances in this area. 

Overview and Outline of the Report 

The work described in the full report is the culmination of several years of 
research at the National Center for Research on Evaluation, Standards, and Student 
Testing (CRESST) that focused initially on articulation of the academic English 
construct in school settings, and finally on the use of that information for the 
development of prototype reading tasks of academic English. Specifically, the 
report presents findings from a series of small-scale tryouts and a pilot study with 
reading tasks designed to assess 5th grade academic English language proficiency 
(AELP). 

The report begins with a summary of the prior research at CRESST, which 
provides the background and context for the AELP task development. The specific 
goals of the task development effort are then outlined. Next, we describe the 
procedures and instrumentation of each of the two phases of administering and 
revising the AELP tasks, followed by analyses of the data collected during the 
pre-pilot phase and the subsequent pilot phase. 







Six task profiles demonstrate how tasks were refined in light of feedback from 
verbal protocols with students and psychometric information on item-level 
performance. Tasks based on reading passages from mathematics, science, and 
social studies content areas are used to illustrate in considerable depth the 
decision-making process for how tasks could be retained without modification, 
modified and retained for piloting, or rejected as unsuitable for further 
development. The report concludes with recommendations for refinement of the 
research- and standards-informed test development process and implications for 
further research in this area. 
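
As an illustration of the item-level psychometric information referred to above (item 
difficulty and item discrimination), the following is a minimal sketch under stated 
assumptions, not the procedure used in the report. It assumes classical test theory 
indices: difficulty as the proportion of students answering correctly, and discrimination 
as the corrected item-total correlation. The summary does not state which indices were 
computed, and the function names and response data below are hypothetical. 

```python
from statistics import mean, pstdev


def item_difficulty(responses, item):
    """Proportion of students who answered the item correctly.

    Values near 1 indicate an easy item; values near 0 indicate a hard item.
    """
    return mean(row[item] for row in responses)


def item_discrimination(responses, item):
    """Corrected item-total correlation (an assumed index, see lead-in).

    Pearson correlation between the 0/1 item score and the total score on
    all other items. Returns None when either score has no variance.
    """
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    sd_item, sd_rest = pstdev(item_scores), pstdev(rest_scores)
    if sd_item == 0 or sd_rest == 0:
        return None
    m_item, m_rest = mean(item_scores), mean(rest_scores)
    cov = mean((i - m_item) * (r - m_rest)
               for i, r in zip(item_scores, rest_scores))
    return cov / (sd_item * sd_rest)


# Hypothetical scored responses: one row per student, one column per item
# (1 = correct, 0 = incorrect). These are not data from the report.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    for item in range(len(responses[0])):
        print(item,
              item_difficulty(responses, item),
              item_discrimination(responses, item))
```

Under these assumptions, an item with a negative discrimination value (students who do 
well on the other items tend to miss it) would be a natural candidate for modification or 
rejection, whereas an item of moderate difficulty with positive discrimination would be a 
candidate for retention. 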

Context and Stages of AELP Test Development 

The impetus for this long-term initiative grew out of the need to ensure access 
for all students in evaluation of their academic progress. In the mid-to-late 1990s, 
the validity of large-scale (standardized) assessments with English language learner 
(ELL) students came into question (August & Hakuta, 1997; Butler & Stevens, 1997, 
2001; LaCelle-Peterson & Rivers, 1994). This concern led to further issues, including 
the use of test accommodations with ELL students (Abedi, 1997; Abedi, Lord, & 
Plummer, 1997; Butler & Stevens, 1997) and the effectiveness of existing language 
proficiency tests for evaluating the English language skills of those students 
(Stevens, Butler, & Castellon-Wellington, 2000; Butler & Stevens, 2001; Bailey & 
Butler, 2002/2003). CRESST research was showing that existing language tests were 
not good predictors of performance on standardized content tests (Butler & 
Castellon-Wellington, 2000/2005). There was a mismatch between the language 
tested on language proficiency tests (everyday vocabulary and simple structures) 
and the language used on content tests and in the classroom (more precise uses of 
vocabulary and complex structures; Stevens, Butler, & Castellon-Wellington, 2000). 
The distinctions between the two are typically characterized as social versus 
academic English, although the distinctions are not always easy to articulate. Since 
both are critical to the student's English language development, educators began to 
recognize the need for expanding the content domain of K-12 English language 
proficiency tests to include academic English. 

The No Child Left Behind Act of 2001, which required that ELL students show 
measurable yearly progress in English language development (ELD), brought the 
language proficiency assessment of ELL students to the forefront of the national 
educational discussion. The need for language tests that focused on academic 
English, or at least included features of academic English in the test content, rapidly 







